231 research outputs found
Adversarial Sets for Regularising Neural Link Predictors
In adversarial training, a set of models learn together by pursuing competing
goals, usually defined on single data instances. However, in relational
learning and other non-i.i.d. domains, goals can also be defined over sets of
instances. For example, a link predictor for the is-a relation needs to be
consistent with the transitivity property: if is-a(x_1, x_2) and is-a(x_2, x_3)
hold, is-a(x_1, x_3) needs to hold as well. Here we use such assumptions for
deriving an inconsistency loss, measuring the degree to which the model
violates the assumptions on an adversarially-generated set of examples. The
training objective is defined as a minimax problem, where an adversary finds
the most offending adversarial examples by maximising the inconsistency loss,
and the model is trained by jointly minimising a supervised loss and the
inconsistency loss on the adversarial examples. This yields the first method
that can use function-free Horn clauses (as in Datalog) to regularise any
neural link predictor, with complexity independent of the domain size. We show
that for several link prediction models, the optimisation problem faced by the
adversary has efficient closed-form solutions. Experiments on link prediction
benchmarks indicate that given suitable prior knowledge, our method can
significantly improve neural link predictors on all relevant metrics.
Comment: Proceedings of the 33rd Conference on Uncertainty in Artificial Intelligence (UAI), 2017
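The minimax setup described above can be sketched end-to-end at toy scale. The snippet below is a minimal illustration, not the paper's implementation: it assumes a DistMult-style scorer, a single illustrative is-a relation vector, and a brute-force adversary over random candidate embeddings (the paper instead derives closed-form maximisers for several models).

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def prob(rel, s, o):
    """DistMult-style probability that the link rel(s, o) holds."""
    return sigmoid(sum(r * a * b for r, a, b in zip(rel, s, o)))

def inconsistency(rel, x1, x2, x3):
    """Violation of the transitivity clause
    is-a(x1, x2) AND is-a(x2, x3) => is-a(x1, x3):
    positive when the clause body is probable but the head is not."""
    body = min(prob(rel, x1, x2), prob(rel, x2, x3))
    head = prob(rel, x1, x3)
    return max(0.0, body - head)

random.seed(0)
rel = [1.0, 0.5, -0.3]  # illustrative embedding for the is-a relation
# The adversary searches candidate embedding sets for the most offending one.
pool = [[[random.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
        for _ in range(200)]
worst = max(pool, key=lambda xs: inconsistency(rel, *xs))
loss = inconsistency(rel, *worst)  # to be minimised jointly with the supervised loss
print(round(loss, 3))
```

During training, the model parameters would be updated to shrink this loss while the adversary keeps regenerating the most offending embedding sets.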
Convolutional 2D Knowledge Graph Embeddings
Link prediction for knowledge graphs is the task of predicting missing
relationships between entities. Previous work on link prediction has focused on
shallow, fast models which can scale to large knowledge graphs. However, these
models learn less expressive features than deep, multi-layer models -- which
potentially limits performance. In this work, we introduce ConvE, a multi-layer
convolutional network model for link prediction, and report state-of-the-art
results for several established datasets. We also show that the model is highly
parameter efficient, yielding the same performance as DistMult and R-GCN with
8x and 17x fewer parameters. Analysis of our model suggests that it is
particularly effective at modelling nodes with high indegree -- which are
common in highly-connected, complex knowledge graphs such as Freebase and
YAGO3. In addition, it has been noted that the WN18 and FB15k datasets suffer
from test set leakage, due to inverse relations from the training set being
present in the test set -- however, the extent of this issue has so far not
been quantified. We find this problem to be severe: a simple rule-based model
can achieve state-of-the-art results on both WN18 and FB15k. To ensure that
models are evaluated on datasets where simply exploiting inverse relations
cannot yield competitive results, we investigate and validate several commonly
used datasets -- deriving robust variants where necessary. We then perform
experiments on these robust datasets for our own and several previously
proposed models and find that ConvE achieves state-of-the-art Mean Reciprocal
Rank across most datasets.
Comment: Extended AAAI 2018 paper
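The rule-based baseline that exposes the inverse-relation leakage can be approximated in a few lines. This is a hedged sketch on assumed toy data, not the paper's exact model: a relation r2 is treated as the inverse of r when most of r's training pairs appear reversed under r2, and a test query (s, r, ?) is then answered by inverting a memorised training triple.

```python
from collections import defaultdict

def find_inverses(train, threshold=0.8):
    """Map each relation to a likely inverse when at least `threshold`
    of its triples appear reversed under another relation in training."""
    triples = set(train)
    by_rel = defaultdict(list)
    for s, r, o in train:
        by_rel[r].append((s, o))
    inverses = {}
    for r, pairs in by_rel.items():
        for r2 in by_rel:
            hits = sum((o, r2, s) in triples for s, o in pairs)
            if hits / len(pairs) >= threshold:
                inverses[r] = r2
    return inverses

def predict_tail(train, inverses, s, r):
    """Answer (s, r, ?) by looking up the inverted training triple."""
    inv = inverses.get(r)
    if inv is not None:
        for s2, r2, o2 in train:
            if r2 == inv and o2 == s:
                return s2
    return None

train = [("cat", "hypernym", "animal"), ("animal", "hyponym", "cat"),
         ("dog", "hypernym", "animal"), ("animal", "hyponym", "dog")]
inverses = find_inverses(train)
print(predict_tail(train, inverses, "animal", "hyponym"))
```

On WN18 and FB15k, where many test triples are exactly such inversions of training triples, a rule of this shape scores highly without learning any embeddings at all.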
Adversarially Regularising Neural NLI Models to Integrate Logical Background Knowledge
Adversarial examples are inputs to machine learning models designed to cause
the model to make a mistake. They are useful for understanding the shortcomings
of machine learning models, interpreting their results, and for regularisation.
In NLP, however, most example generation strategies produce input text by using
known, pre-specified semantic transformations, requiring significant manual
effort and in-depth understanding of the problem and domain. In this paper, we
investigate the problem of automatically generating adversarial examples that
violate a set of given First-Order Logic constraints in Natural Language
Inference (NLI). We reduce the problem of identifying such adversarial examples
to a combinatorial optimisation problem, by maximising a quantity measuring the
degree of violation of such constraints and by using a language model for
generating linguistically-plausible examples. Furthermore, we propose a method
for adversarially regularising neural NLI models for incorporating background
knowledge. Our results show that, while the proposed method does not always
improve results on the SNLI and MultiNLI datasets, it significantly and
consistently increases the predictive accuracy on adversarially-crafted
datasets -- up to a 79.6% relative improvement -- while drastically reducing
the number of background knowledge violations. Furthermore, we show that
adversarial examples transfer among model architectures, and that the proposed
adversarial training procedure improves the robustness of NLI models to
adversarial examples.
Comment: Accepted at the SIGNLL Conference on Computational Natural Language Learning (CoNLL 2018)
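The core of the adversarial search can be illustrated with one constraint. The sketch below is a toy stand-in, not the paper's system: the `model` stub plays the role of a trained NLI model, the constraint is the symmetry of contradiction (if p contradicts h, then h contradicts p), and the "language model" step is replaced by a fixed candidate pool.

```python
def model(premise, hypothesis):
    """Stub NLI model: probability that `premise` contradicts `hypothesis`.
    The table of scores is illustrative, not from a real model."""
    rules = {("a man is sleeping", "a man is awake"): 0.9,
             ("a man is awake", "a man is sleeping"): 0.2,
             ("a dog runs", "a cat sits"): 0.1,
             ("a cat sits", "a dog runs"): 0.1}
    return rules.get((premise, hypothesis), 0.0)

def violation(p, h):
    """Degree to which the constraint
    contradicts(p, h) => contradicts(h, p) is violated."""
    return max(0.0, model(p, h) - model(h, p))

# The adversary keeps the candidate pair that most violates the constraint.
candidates = [("a man is sleeping", "a man is awake"),
              ("a dog runs", "a cat sits")]
worst = max(candidates, key=lambda ph: violation(*ph))
print(worst, round(violation(*worst), 2))
```

Regularisation then adds this violation score, computed on the worst pairs found, to the supervised NLI training objective.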
REFER: An End-to-end Rationale Extraction Framework for Explanation Regularization
Human-annotated textual explanations are becoming increasingly important in
Explainable Natural Language Processing. Rationale extraction aims to provide
faithful (i.e., reflective of the behavior of the model) and plausible (i.e.,
convincing to humans) explanations by highlighting the inputs that had the
largest impact on the prediction without compromising the performance of the
task model. In recent works, the focus of training rationale extractors was
primarily on optimizing for plausibility using human highlights, while the task
model was trained to jointly optimize task predictive accuracy and
faithfulness. We propose REFER, a framework that employs a differentiable
rationale extractor that allows back-propagation through the rationale
extraction process. We analyze the impact of using human highlights during
training by jointly training the task model and the rationale extractor. In our
experiments, REFER yields significantly better results in terms of
faithfulness, plausibility, and downstream task accuracy on both
in-distribution and out-of-distribution data. On both e-SNLI and CoS-E, our
best setting produces better results in terms of composite normalized relative
gain than the previous baselines by 11% and 3%, respectively.
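The joint training signal can be sketched with a differentiable (soft) rationale mask. This is an assumed, simplified stand-in for REFER's objective, not its exact formulation: the mask comes from per-token scores, the plausibility term compares it to human highlights, and `alpha` is an illustrative weight.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def bce(p, y):
    """Binary cross-entropy between a soft mask value and a human highlight."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def joint_loss(scores, highlights, task_loss, alpha=0.5):
    """Task loss plus a plausibility term; because the mask is a smooth
    function of the scores, gradients reach the extractor end-to-end."""
    mask = [sigmoid(s) for s in scores]  # differentiable rationale mask
    plaus = sum(bce(m, h) for m, h in zip(mask, highlights)) / len(mask)
    return task_loss + alpha * plaus

loss = joint_loss([2.0, -1.5, 0.3], [1, 0, 0], task_loss=0.4)
print(round(loss, 3))
```

With a hard (argmax) mask instead, the plausibility term could not propagate gradients into the extractor, which is the limitation the differentiable design removes.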
Implicit MLE: Backpropagating Through Discrete Exponential Family Distributions
Combining discrete probability distributions and combinatorial optimization
problems with neural network components has numerous applications but poses
several challenges. We propose Implicit Maximum Likelihood Estimation (I-MLE),
a framework for end-to-end learning of models combining discrete exponential
family distributions and differentiable neural components. I-MLE is widely
applicable as it only requires the ability to compute the most probable states
and does not rely on smooth relaxations. The framework encompasses several
approaches such as perturbation-based implicit differentiation and recent
methods to differentiate through black-box combinatorial solvers. We introduce
a novel class of noise distributions for approximating marginals via
perturb-and-MAP. Moreover, we show that I-MLE simplifies to maximum likelihood
estimation when used in some recently studied learning settings that involve
combinatorial solvers. Experiments on several datasets suggest that I-MLE is
competitive with and often outperforms existing approaches which rely on
problem-specific relaxations.
Comment: NeurIPS 2021 camera-ready; repo: https://github.com/nec-research/tf-iml
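The I-MLE gradient estimate can be sketched for a discrete top-k selection layer. The snippet is a hedged toy version, not the library's code: Gumbel noise and the target-distribution step size `lam` are illustrative choices, and the MAP oracle is a plain top-k over perturbed logits.

```python
import math
import random

def map_topk(theta, k):
    """MAP state of the k-subset distribution: one-hot mask of top-k logits."""
    top = sorted(range(len(theta)), key=lambda i: theta[i], reverse=True)[:k]
    return [1.0 if i in top else 0.0 for i in range(len(theta))]

def imle_grad(theta, dloss_dz, k, lam=10.0, seed=0):
    """I-MLE-style estimate of d loss / d theta through the discrete layer:
    compare the perturbed MAP state with a target MAP state whose logits
    are nudged against the downstream gradient (same noise for both)."""
    rng = random.Random(seed)
    eps = [-math.log(-math.log(rng.random())) for _ in theta]  # Gumbel noise
    z = map_topk([t + e for t, e in zip(theta, eps)], k)
    theta_prime = [t - lam * g for t, g in zip(theta, dloss_dz)]
    z_prime = map_topk([t + e for t, e in zip(theta_prime, eps)], k)
    return [a - b for a, b in zip(z, z_prime)]

theta = [2.0, 1.0, 0.5, -1.0]
dloss_dz = [0.0, 0.0, -1.0, 0.0]  # the loss drops if item 2 gets selected
grad = imle_grad(theta, dloss_dz, k=2)
print(grad)
```

Note the only calls into the discrete structure are MAP queries, which is the point of the framework: no smooth relaxation of top-k is needed.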
Embedding Cardinality Constraints in Neural Link Predictors
Neural link predictors learn distributed representations of entities and
relations in a knowledge graph. They are remarkably powerful in the link
prediction and knowledge base completion tasks, mainly due to the learned
representations that capture important statistical dependencies in the data.
Recent works in the area have focused on either designing new scoring functions
or incorporating extra information into the learning process to improve the
representations. Yet the representations are mostly learned from the observed
links between entities, ignoring commonsense or schema knowledge associated
with the relations in the graph. A fundamental aspect of the topology of
relational data is the cardinality information, which bounds the number of
predictions given for a relation between a minimum and maximum frequency. In
this paper, we propose a new regularisation approach to incorporate relation
cardinality constraints to any existing neural link predictor without affecting
their efficiency or scalability. Our regularisation term aims to impose
boundaries on the number of predictions with high probability, thus,
structuring the embeddings space to respect commonsense cardinality assumptions
resulting in better representations. Experimental results on Freebase, WordNet
and YAGO show that, given suitable prior knowledge, the proposed method
positively impacts the predictive accuracy of downstream link prediction tasks.
Comment: 8 pages, accepted at the 34th ACM/SIGAPP Symposium on Applied Computing (SAC '19)
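A cardinality regulariser of this kind can be sketched in a few lines. This is an assumed simplification, not the paper's exact term: for one (subject, relation) pair, the expected number of predicted objects is the sum of the model's link probabilities, and a hinge penalty grows when that expectation leaves the [lo, hi] bounds given by prior knowledge.

```python
def cardinality_penalty(probs, lo, hi):
    """Hinge penalty on the expected prediction count for one relation."""
    expected = sum(probs)  # expected number of true links under the model
    return max(0.0, lo - expected) + max(0.0, expected - hi)

# e.g. born-in: every person has exactly one birthplace, so lo = hi = 1.
probs = [0.9, 0.8, 0.1, 0.05]  # illustrative scores for candidate cities
print(cardinality_penalty(probs, lo=1, hi=1))
```

Because the penalty only reads the scores the predictor already produces, it adds no per-entity parameters, which is why it leaves the base model's efficiency and scalability untouched.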
SPARSEFIT: Few-shot Prompting with Sparse Fine-tuning for Jointly Generating Predictions and Natural Language Explanations
Explaining the decisions of neural models is crucial for ensuring their
trustworthiness at deployment time. Using Natural Language Explanations (NLEs)
to justify a model's predictions has recently gained increasing interest.
However, this approach usually demands large datasets of human-written NLEs for
the ground-truth answers, which are expensive and potentially infeasible for
some applications. For models to generate high-quality NLEs when only a few
NLEs are available, fine-tuning Pre-trained Language Models (PLMs) in
conjunction with prompt-based learning has recently emerged. However, PLMs
typically have billions of parameters, making fine-tuning expensive. We propose
SparseFit, a sparse few-shot fine-tuning strategy that leverages discrete
prompts to jointly generate predictions and NLEs. We experiment with SparseFit
on the T5 model and four datasets and compare it against state-of-the-art
parameter-efficient fine-tuning techniques. We perform automatic and human
evaluations to assess the quality of the model-generated NLEs, finding that
fine-tuning only 6.8% of the model parameters leads to competitive results for
both the task performance and the quality of the NLEs.
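The sparse-update idea can be illustrated generically. The sketch below is not SparseFit's actual selection scheme: it simply freezes all but a small fraction of parameters (chosen here by gradient magnitude, with the fraction echoing the ~6.8% figure) and applies a plain gradient step to the rest.

```python
def sparse_update(params, grads, fraction=0.068, lr=0.1):
    """Update only the `fraction` of parameters with the largest |gradient|;
    everything else stays frozen at its pre-trained value."""
    k = max(1, int(len(params) * fraction))
    chosen = set(sorted(range(len(params)),
                        key=lambda i: abs(grads[i]), reverse=True)[:k])
    return [p - lr * g if i in chosen else p
            for i, (p, g) in enumerate(zip(params, grads))]

params = [0.5, -0.2, 0.1, 0.8, -0.4]
grads = [0.01, 0.9, -0.05, 0.02, -0.3]
print(sparse_update(params, grads))
```

Only one of the five toy parameters moves, which is the trade-off the abstract describes: far cheaper fine-tuning at a small (here, toy) loss of flexibility.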
Extrapolation in NLP
We argue that extrapolation to examples outside the training space will often
be easier for models that capture global structures, rather than just maximise
their local fit to the training data. We show that this is true for two popular
models: the Decomposable Attention Model and word2vec.